Release 2026.2.0 #98

Merged
bencap merged 19 commits into mavedb-main from mavedb-dev
May 8, 2026

bencap and others added 12 commits April 15, 2026 16:00
- Add `build_ref_identical_allele` and `translate_ref_identical_to_vrs`
  to `lookup.py` to represent HGVS `.=` variants as VRS Alleles with a
  `ReferenceLengthExpression` state
- Handle `.=` variants explicitly in `_construct_vrs_allele` and
  `_create_post_mapped_hgvs_strings` instead of silently skipping them
- Remove `len(hgvs) == 3` guards from validity predicates and early-exit
  checks; the reference-identical case is now handled directly
- Fix `is_intronic_variant` to return False for position-less variants
- Skip `vrs_ref_allele_seq` annotation for `ReferenceLengthExpression`
  alleles to avoid redundant and expensive full-sequence lookups
- Introduce `AlignmentQc` schema capturing per-alignment BLAT quality
  metrics (identity, CIGAR, mismatch positions, gap intervals) with
  in-memory-only position lists excluded from serialization
- Add `TargetMapping` schema representing a per-(target, layer) mapping
  row with variant counts, tool parameters, and preferred-layer flag
- Add `VrsMapResult` NamedTuple to pair VRS mappings with their
  `TargetMapping` rows from `vrs_map`
- Rename `annotation_layer` → `alignment_level` on `MappedScore` /
  `ScoreAnnotation` to align with new terminology
- Rename `ident_pct` → `percent_identity` on `AlignmentResult`; add
  `score`, `next_best_score`, `alignment_qc`, `aligner_parameters`, and
  `reference_assembly` fields
- Implement `build_scoreset_mapping` in `annotate.py` to assemble
  `ScoresetMapping` with populated `target_mappings` list, per-variant
  locus-quality flags (`at_mismatched_locus`, `near_gap`), and reference
  sequence metadata
- Restore canonical BLAT PSL scoring (`matches - misMatches -
  qNumInsert - tNumInsert`) in `_get_best_hsp`; previous BioPython port
  used raw identity count, causing noisy alignments to outrank clean ones
- Update JSON schema, API router, CI workflow, and README to reflect new
  output shape
- Add `test_annotate_target_mapping.py` and expand `test_align.py` /
  `test_annotate.py` with unit tests for new logic
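
The scoring change described above can be illustrated with a minimal sketch. It uses only the formula quoted in the commit message; the `PslCounts` type, the `psl_score` helper, and the example counts are hypothetical and are not taken from `_get_best_hsp`.

```python
from typing import NamedTuple


class PslCounts(NamedTuple):
    """Subset of PSL columns used for scoring (hypothetical container)."""

    matches: int
    mis_matches: int
    q_num_insert: int
    t_num_insert: int


def psl_score(hsp: PslCounts) -> int:
    # Formula as quoted in the commit message:
    # matches - misMatches - qNumInsert - tNumInsert
    return hsp.matches - hsp.mis_matches - hsp.q_num_insert - hsp.t_num_insert


# Made-up counts: the noisy hit has more raw matches, but also far more
# mismatches and gaps than the clean hit.
clean = PslCounts(matches=290, mis_matches=2, q_num_insert=0, t_num_insert=0)
noisy = PslCounts(matches=300, mis_matches=40, q_num_insert=3, t_num_insert=5)

# Ranking by raw identity count picks the noisy hit ...
best_by_identity = max([clean, noisy], key=lambda h: h.matches)
# ... while ranking by the PSL-style score picks the clean one.
best_by_score = max([clean, noisy], key=psl_score)
```

With these counts the clean hit scores 288 and the noisy hit 252, so the penalty terms reverse the ordering that the raw match count would give.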

Co-authored-by: Copilot <copilot@github.com>
…alignment_qc

- protein-vs-DNA (-q=prot -t=dnax) BLAT runs store target coords in
  nucleotide space and query coords in amino-acid space (3:1 ratio);
  minus-strand target blocks have ts > te, making seq[ts:te] return "".
  Comparison was crashing with ValueError from zip(strict=True); the
  per-base mismatch loop is now skipped entirely for this mode, setting
  mismatch_positions_unavailable=True so at_mismatched_locus is
  correctly left as None (not evaluated) rather than a false False.
  The preferred layer for protein scoresets is PROTEIN, flagged from
  the downstream protein-to-protein alignment, so no signal is lost.

- For nucleotide-vs-nucleotide runs, replace the bare zip(strict=True)
  with an explicit length-mismatch guard that logs a WARNING and falls
  through to zip(strict=False), preserving all mismatches in the
  overlapping prefix rather than crashing or discarding the block.
…s for better matching

This is mostly useful for multi-word target names where gene information is available, but not in the first word of the name.
…and adjust related annotations

Co-authored-by: Copilot <copilot@github.com>
…-identical-vrs

feat(vrs_map): add VRS mapping support for reference-identical variants
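
As a sketch of what a reference-identical allele might look like, the snippet below builds a plain dictionary shaped like a VRS Allele with a `ReferenceLengthExpression` state. It does not use the project's `build_ref_identical_allele` or any VRS library. The field names follow one reading of the VRS 2.x schema, and setting `repeatSubunitLength` equal to `length` is an assumption. How the project chooses the span for a position-less `.=` variant is not shown here.

```python
def ref_identical_allele(refget_accession: str, start: int, end: int) -> dict:
    """Build a VRS-style Allele dict whose state means "same as reference".

    Coordinates are inter-residue (0-based start, exclusive end).
    """
    if not 0 <= start <= end:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    length = end - start
    return {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": refget_accession,
            },
            "start": start,
            "end": end,
        },
        # The state carries only lengths, no literal sequence, so no
        # reference sequence lookup is needed to build it.
        "state": {
            "type": "ReferenceLengthExpression",
            "length": length,
            "repeatSubunitLength": length,  # assumption, see lead-in
        },
    }


# "SQ.placeholder" is a made-up accession used only for illustration.
allele = ref_identical_allele("SQ.placeholder", start=4, end=5)
```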
bencap self-assigned this on May 4, 2026
bencap added 2 commits May 5, 2026 11:37
…-generated-unnecessarily

Fix protein layer generation and error handling in vrs_map
…or-visibility

feat(logging): improve error visibility and logging across application
bencap changed the title from "Release 2026.1.3" to "Release 2026.2.0" on May 5, 2026
bencap and others added 3 commits May 6, 2026 11:41
…vel-mapping-metadata

feat: Target level mapping metadata
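
One way to picture the "in-memory-only position lists excluded from serialization" idea behind the alignment QC metadata is the standard-library sketch below. The class name, field names, and the choice of which fields to exclude are assumptions; the real `AlignmentQc` schema may use a different mechanism.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AlignmentQcSketch:
    """Hypothetical stand-in for a per-alignment QC record."""

    percent_identity: float
    cigar: str
    # In-memory only: tagged so the serializer below skips them.
    mismatch_positions: list[int] = field(
        default_factory=list, metadata={"exclude": True}
    )
    gap_intervals: list[tuple[int, int]] = field(
        default_factory=list, metadata={"exclude": True}
    )

    def to_serializable(self) -> dict:
        # Keep every field that is not tagged for exclusion.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if not f.metadata.get("exclude", False)
        }


qc = AlignmentQcSketch(
    percent_identity=99.5,
    cigar="100M",
    mismatch_positions=[3, 7],
    gap_intervals=[(10, 12)],
)
payload = qc.to_serializable()
```

The position lists stay available on the object for per-variant checks, while `payload` contains only the summary fields.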
- Exclude `notebooks/` directory from linting (1,299 noise errors)
- Remove deprecated `ANN101` rule from configuration
- Fix invalid noqa directive format in vrs_map.py (`:` vs `. `)
- Update deprecated rule codes: `ASYNC101` → `ASYNC221` in main.py
- Migrate `str + Enum` to `StrEnum` in schemas.py (Python 3.11+)
- Fix subprocess S603 noqa placement in align.py
- Wrap `NamedTemporaryFile` in context manager (SIM115)
- Fix operator precedence with parentheses in annotate.py (RUF021)
- Sort `__all__` exports in lookup.py and mavedb_data.py (RUF022)
- Add `# noqa: A004` for unavoidable `map` builtin shadowing
- Auto-fix pytest decorator style: `@pytest.fixture()` → `@pytest.fixture`
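
For context on the SIM115 item in the list above, a minimal sketch of wrapping `NamedTemporaryFile` in a context manager follows. The helper name `write_query_fasta` and the use of `delete=False` are illustrative assumptions, not the code in align.py.

```python
import os
import tempfile


def write_query_fasta(name: str, sequence: str) -> str:
    """Write a one-record FASTA file and return its path."""
    # The with-block guarantees the handle is flushed and closed even if
    # write() raises. delete=False keeps the file on disk afterwards so a
    # separate process could read it; the caller removes it when done.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as handle:
        handle.write(f">{name}\n{sequence}\n")
        return handle.name


path = write_query_fasta("query", "ACGT")
with open(path) as fh:
    contents = fh.read()
os.remove(path)
```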
bencap marked this pull request as ready for review on May 8, 2026 16:53
bencap merged commit 0adc019 into mavedb-main on May 8, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

- Introduce a mapping table at the target level
- Mapping Output for Reference Identical WT Variants